412 research outputs found

    Making Digital Artifacts on the Web Verifiable and Reliable

    The current Web has no general mechanism to make digital artifacts --- such as datasets, code, texts, and images --- verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted method to enforce this immutability. These shortcomings seriously impair the reproducibility of processes that rely on Web resources, which in turn affects areas such as science, where reproducibility is essential. To solve this problem, we propose trusty URIs containing cryptographic hash values. We show how trusty URIs can be used to verify digital artifacts in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate how the contents of these files become immutable, including dependencies on external digital artifacts, thereby extending the range of verifiability to the entire reference tree. Our approach adheres to the core principles of the Web, namely openness and decentralized architecture, and is fully compatible with existing standards and protocols. Evaluation of our reference implementations shows that these design goals are indeed accomplished and that the approach remains practical even for very large files. Comment: Extended version of conference paper: arXiv:1401.577
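The core idea of a hash-based identifier can be illustrated with a short sketch. This is not the actual trusty URI specification (which defines its own modules and encoding for different artifact types); it simply appends a URL-safe SHA-256 digest of an artifact's bytes to its URI, so anyone holding the URI can later check that the content is unchanged. The base URI and byte strings below are hypothetical.

```python
import base64
import hashlib

def make_verifiable_uri(base_uri: str, content: bytes) -> str:
    """Derive a hash-suffixed URI from an artifact's bytes (illustrative only)."""
    digest = hashlib.sha256(content).digest()
    # URL-safe base64 without padding keeps the suffix URI-friendly.
    suffix = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{base_uri}.{suffix}"

def verify(uri: str, content: bytes) -> bool:
    """Recompute the hash from the content and compare against the URI."""
    base_uri, _, _ = uri.rpartition(".")
    return make_verifiable_uri(base_uri, content) == uri

uri = make_verifiable_uri("http://example.org/artifact", b"dataset bytes")
assert verify(uri, b"dataset bytes")        # content unchanged
assert not verify(uri, b"tampered bytes")   # any modification is detected
```

Because the hash is part of the identifier itself, verification needs no trusted third party: whoever resolves the URI can check the bytes locally, which is what makes references to such artifacts immutable in practice.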

    Performance of the Charniak-Lease parser on biological text using different training corpora

    POS tagging is used as the first step in many NLP workflows, although the accuracy of tag assignment frequently goes unchecked. We hypothesize that changing the training corpora for a parser will affect its POS tagging of a target corpus. To this end, we train the Charniak-Lease parser on the WSJ corpus and on two biomedical corpora, and evaluate its output against MedPost, a POS tagger with a reported 97% accuracy on biomedical text. Our findings indicate that using biomedical training corpora significantly improves performance, but that minor differences between the biomedical training corpora have a significant effect on the correctness of POS tagging. Specifically, the tagging of hyphenated words and verbs was affected. This work suggests that the choice of training corpora is crucial to domain-targeted NLP analysis.
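The evaluation described above boils down to token-level agreement between two tag streams over the same tokens. A minimal sketch of that comparison, with hypothetical tags standing in for real MedPost and parser output:

```python
def tag_agreement(reference: list[str], candidate: list[str]) -> float:
    """Fraction of tokens where two POS-tag sequences agree."""
    if len(reference) != len(candidate):
        raise ValueError("tag sequences must align token for token")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)

# Hypothetical tags for a four-token sentence:
medpost_tags = ["NN", "VBZ", "JJ", "NNS"]  # stand-in for MedPost output
parser_tags = ["NN", "VBZ", "JJ", "NN"]    # stand-in for parser output
agreement = tag_agreement(medpost_tags, parser_tags)  # 0.75
```

In practice one would also break agreement down by tag class (e.g. verbs, hyphenated words) to surface exactly the kind of systematic differences the study reports.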

    A web API ecosystem through feature-based reuse

    The fast-growing web API landscape brings clients more options than ever before, at least in theory. In practice, they cannot easily switch between different providers offering similar functionality. We discuss a vision for developing web APIs based on the reuse of interface parts called features. Through the introduction of five design principles, we investigate the impact of feature-based reuse on web APIs. Applying these principles enables granular reuse of client and server code, documentation, and tools. Together, they can foster a measurable ecosystem with cross-API compatibility, opening the door to a more flexible generation of web clients.

    Advancing discovery science with FAIR data stewardship: Findable, accessible, interoperable, reusable

    This report summarizes a presentation by Dr. Michel Dumontier. It reviews innovative scientific research methods created by data science and the need to develop infrastructure, methodologies, and user communities to advance data science. Stakeholders have proposed a set of principles to make digital resources findable, accessible, interoperable, and reusable (FAIR). FAIR principles provide guidelines, do not require specific technologies, and allow communities of stakeholders to define specific FAIR standards and develop metrics to quantify them. Libraries can be part of the new data ecosystem by providing education, data stewardship, and infrastructure.

    A Web API ecosystem through feature-based reuse

    The current Web API landscape does not scale well: every API requires its own hardcoded clients in an unusually short-lived, tightly coupled relationship of highly subjective quality. This directly inflates development costs and prevents the design of a more intelligent generation of clients that provide cross-API compatibility. We introduce five principles to establish an ecosystem in which Web APIs consist of modular interface features with shared semantics, whose implementations can be reused by clients and servers across domains and over time. Web APIs and their features should be measured for effectiveness in a task-driven way. This enables an objective and quantifiable discourse on the appropriateness of a certain interface design for certain scenarios, and shifts the focus from creating interfaces for the short term to empowering clients in the long term.

    Putting FAIR Evidence into Practice


    NBLAST: a cluster variant of BLAST for NxN comparisons

    BACKGROUND: The BLAST algorithm compares biological sequences to one another to determine shared motifs and common ancestry. However, comparing all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST, a cluster-computer implementation of the BLAST family of sequence comparison programs, to generate pre-computed BLAST alignments and neighbour lists of NR sequences. RESULTS: NBLAST performs the heuristic BLAST algorithm and generates an exhaustive database of alignments, but it computes only the upper triangle of the N x N comparison matrix rather than all of the possible N² alignments, where N is the number of sequences to be compared. A task-partitioning algorithm distributes the work across all cluster nodes, and the NBLAST master process produces a BLAST sequence alignment database and a list of sequence neighbours for each sequence record. The resulting sequence alignment and neighbour databases serve the SeqHound query system through a C/C++ and Perl Application Programming Interface (API). CONCLUSIONS: NBLAST offers a local alternative to the NCBI's remote Entrez system for pre-computed BLAST alignments and neighbour queries. On our 216-processor 450 MHz PIII cluster, NBLAST requires ~24 hrs to compute neighbours for the 850,000 proteins currently in the non-redundant protein database.
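The upper-triangle idea can be sketched in a few lines. This is a toy stand-in for NBLAST's actual task-partitioning algorithm (the function name and round-robin assignment are assumptions for illustration): since BLAST comparisons are pairwise, only the unordered pairs (i, j) with i < j need to be computed, roughly halving the N x N workload, and those pairs can be dealt out across worker nodes.

```python
from itertools import combinations

def upper_triangle_tasks(n_seqs: int, n_workers: int) -> list[list[tuple[int, int]]]:
    """Assign each unordered pair (i, j), i < j, to a worker round-robin.

    Only the upper triangle of the N x N comparison matrix is generated,
    so N*(N-1)/2 pairs are computed instead of N*N.
    """
    tasks: list[list[tuple[int, int]]] = [[] for _ in range(n_workers)]
    for k, pair in enumerate(combinations(range(n_seqs), 2)):
        tasks[k % n_workers].append(pair)
    return tasks

tasks = upper_triangle_tasks(5, 2)
total_pairs = sum(len(t) for t in tasks)  # 5 * 4 // 2 = 10 pairs
```

A real scheduler would balance by estimated alignment cost rather than pair count, since sequence lengths vary widely, but the triangular enumeration is the same.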